{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FN7k9-TsMICZ"
      },
      "source": [
        "##### Copyright 2020 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "FNJDzmhEMJxP"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aPVGKX1CDwk6"
      },
      "source": [
        "# TFDS and determinism"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gLgkbSCbTHGT"
      },
      "source": [
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/datasets/determinism\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/determinism.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/datasets/blob/master/docs/determinism.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/datasets/docs/determinism.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hxyk-aykTMBQ"
      },
      "source": [
        "This document explains:\n",
        "\n",
        "* The TFDS guarantees on determinism\n",
        "* In which order does TFDS read examples\n",
        "* Various caveats and gotchas\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RvSNu11KPL1l"
      },
      "source": [
        "## Setup\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5Ho-Btn6CRpM"
      },
      "source": [
        "### Datasets\n",
        "\n",
        "Some context is needed to understand how TFDS reads the data.\n",
        "\n",
        "During generation, TFDS write the original data into standardized `.tfrecord` files. For big datasets, multiple `.tfrecord` files are created, each containing multiple examples. We call each `.tfrecord` file a **shard**.\n",
        "\n",
        "This guide uses imagenet which has 1024 shards:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5uWx_PnYB_OO"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "imagenet has 1024 shards (1281167 examples)\n"
          ]
        }
      ],
      "source": [
        "import re\n",
        "import tensorflow_datasets as tfds\n",
        "\n",
        "imagenet = tfds.builder('imagenet2012')\n",
        "\n",
        "num_shards = imagenet.info.splits['train'].num_shards\n",
        "num_examples = imagenet.info.splits['train'].num_examples\n",
        "print(f'imagenet has {num_shards} shards ({num_examples} examples)')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QXwzaoLkD3vl"
      },
      "source": [
        "### Finding the dataset examples ids\n",
        "\n",
        "You can skip to the following section if you only want to know about determinism.\n",
        "\n",
        "Each dataset example is uniquely identified by an `id` (e.g. `'imagenet2012-train.tfrecord-01023-of-01024__32'`). You can recover this\n",
        "`id` by passing `read_config.add_tfds_id = True` which will add a `'tfds_id'` key in the dict from the `tf.data.Dataset`."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ud9H2rr4R5g0"
      },
      "source": [
        "In this tutorial, we define a small util which will print the example ids of the dataset (converted in integer to be more human-readable):"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "wnybvfFAB2QZ"
      },
      "outputs": [],
      "source": [
        "#@title\n",
        "\n",
        "def load_dataset(builder, **as_dataset_kwargs):\n",
        "  \"\"\"Load the dataset with the tfds_id.\"\"\"\n",
        "  read_config = as_dataset_kwargs.pop('read_config', tfds.ReadConfig())\n",
        "  read_config.add_tfds_id = True  # Set `True` to return the 'tfds_id' key\n",
        "  return builder.as_dataset(read_config=read_config, **as_dataset_kwargs)\n",
        "\n",
        "def print_ex_ids(\n",
        "    builder,\n",
        "    *,\n",
        "    take: int,\n",
        "    skip: int = None,\n",
        "    **as_dataset_kwargs,\n",
        ") -\u003e None:\n",
        "  \"\"\"Print the example ids from the given dataset split.\"\"\"\n",
        "  ds = load_dataset(builder, **as_dataset_kwargs)\n",
        "  if skip:\n",
        "    ds = ds.skip(skip)\n",
        "  ds = ds.take(take)\n",
        "  exs = [ex['tfds_id'].numpy().decode('utf-8') for ex in ds]\n",
        "  exs = [id_to_int(tfds_id, builder=builder) for tfds_id in exs]\n",
        "  print(exs)\n",
        "\n",
        "def id_to_int(tfds_id: str, builder) -\u003e str:\n",
        "  \"\"\"Format the tfds_id in a more human-readable.\"\"\"\n",
        "  match = re.match(r'\\w+-(\\w+).\\w+-(\\d+)-of-\\d+__(\\d+)', tfds_id)\n",
        "  split_name, shard_id, ex_id = match.groups()\n",
        "  split_info = builder.info.splits[split_name]\n",
        "  return sum(split_info.shard_lengths[:int(shard_id)]) + int(ex_id)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OuB1fVkMThfc"
      },
      "source": [
        "## Determinism when reading\n",
        "\n",
        "This section explains deterministim guarantee of `tfds.load`."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IUQnKzMfCKhr"
      },
      "source": [
        "### With `shuffle_files=False` (default)\n",
        "\n",
        "By default TFDS yield examples deterministically (`shuffle_files=False`)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "I2DS1cIXCnRv"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1251, 1252, 1253, 1254]\n",
            "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1251, 1252, 1253, 1254]\n"
          ]
        }
      ],
      "source": [
        "# Same as: imagenet.as_dataset(split='train').take(20)\n",
        "print_ex_ids(imagenet, split='train', take=20)\n",
        "print_ex_ids(imagenet, split='train', take=20)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SOTdwzguYRua"
      },
      "source": [
        "For performance, TFDS read multiple shards at the same time using [tf.data.Dataset.interleave](https://www.tensorflow.org/api_docs/python/tf/data/Dataset?version=nightly#interleave). We see in this example that TFDS switch to shard 2 after reading 16 examples (`..., 14, 15, 1251, 1252, ...`). More on interleave bellow."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mm74ZShHDLaD"
      },
      "source": [
        "Similarly, the subsplit API is also deterministic:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Sy2ZbVrIDPjL"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[858382, 858383, 858384, 858385, 858386, 858387, 858388, 858389, 858390, 858391, 858392, 858393, 858394, 858395, 858396, 858397, 859533, 859534, 859535, 859536]\n",
            "[858382, 858383, 858384, 858385, 858386, 858387, 858388, 858389, 858390, 858391, 858392, 858393, 858394, 858395, 858396, 858397, 859533, 859534, 859535, 859536]\n"
          ]
        }
      ],
      "source": [
        "print_ex_ids(imagenet, split='train[67%:84%]', take=20)\n",
        "print_ex_ids(imagenet, split='train[67%:84%]', take=20)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vTz1KewrEFbl"
      },
      "source": [
        "If you're training for more than one epoch, the above setup is not recommended as all epochs will read the shards in the same order (so randomness is limited to the `ds = ds.shuffle(buffer)` buffer size)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Y-VHVi3RDdBf"
      },
      "source": [
        "### With `shuffle_files=True`\n",
        "\n",
        "With `shuffle_files=True`, shards are shuffled for each epoch, so reading is not deterministic anymore."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "NdUzVeYyFUD9"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[568017, 329050, 329051, 329052, 329053, 329054, 329056, 329055, 568019, 568020, 568021, 568022, 568023, 568018, 568025, 568024, 568026, 568028, 568030, 568031]\n",
            "[43790, 43791, 43792, 43793, 43796, 43794, 43797, 43798, 43795, 43799, 43800, 43801, 43802, 43803, 43804, 43805, 43806, 43807, 43809, 43810]\n"
          ]
        }
      ],
      "source": [
        "print_ex_ids(imagenet, split='train', shuffle_files=True, take=20)\n",
        "print_ex_ids(imagenet, split='train', shuffle_files=True, take=20)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gAJTLLsuFeuP"
      },
      "source": [
        "Note: Setting `shuffle_files=True` also [disable](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/dataset_builder.py?l=676\u0026rcl=354322021)  `deterministic` in [`tf.data.Options`](https://www.tensorflow.org/api_docs/python/tf/data/Options) to give some performance boost. So even small datasets which only have a single shard (like mnist), become non-deterministic.\n",
        "\n",
        "See recipe below to get deterministic file shuffling."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zDg18upoKFX0"
      },
      "source": [
        "### Determinism caveat: interleave args"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "C4vjtL11KSIg"
      },
      "source": [
        "Changing `read_config.interleave_cycle_length`, `read_config.interleave_block_length` will change the examples order.\n",
        "\n",
        "TFDS relies on [tf.data.Dataset.interleave](https://www.tensorflow.org/api_docs/python/tf/data/Dataset?version=nightly#interleave) to only load a few shards at once, improving the performance and reducing memory usage.\n",
        "\n",
        "The example order is only guaranteed to be the same for a fixed value of interleave args. See [interleave doc](https://www.tensorflow.org/api_docs/python/tf/data/Dataset?version=nightly#interleave) to understand what `cycle_length` and `block_length` correspond too.\n",
        "\n",
        "* `cycle_length=16`, `block_length=16`   (default, same as above):"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "vMq50jt6KRY-"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1251, 1252, 1253, 1254]\n"
          ]
        }
      ],
      "source": [
        "print_ex_ids(imagenet, split='train', take=20)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Pjdo3ExfT7vw"
      },
      "source": [
        "* `cycle_length=3`, `block_length=2`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mrE-qErdmxAi"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[0, 1, 1251, 1252, 2502, 2503, 2, 3, 1253, 1254, 2504, 2505, 4, 5, 1255, 1256, 2506, 2507, 6, 7]\n"
          ]
        }
      ],
      "source": [
        "read_config = tfds.ReadConfig(\n",
        "    interleave_cycle_length=3,\n",
        "    interleave_block_length=2,\n",
        ")\n",
        "print_ex_ids(imagenet, split='train', read_config=read_config, take=20)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AGsbzwRXS3LR"
      },
      "source": [
        "In the second example, we see that the dataset read 2 (`block_length=2`) examples in a shard, then switch to the next shard. Every 2 * 3 (`cycle_length=3`) examples, it goes back to the first shard (`shard0-ex0, shard0-ex1, shard1-ex0, shard1-ex1, shard2-ex0, shard2-ex1, shard0-ex2, shard0-ex3, shard1-ex2, shard1-ex3, shard2-ex2,...`).\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8WHS1DRgJ1W8"
      },
      "source": [
        "### Subsplit and example order"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "P4O3cTBBCV8q"
      },
      "source": [
        "Each example has an id `0, 1, ..., num_examples-1`. The [subsplit API](https://www.tensorflow.org/datasets/splits) select a slice of examples (e.g. `train[:x]` select `0, 1, ..., x-1`).\n",
        "\n",
        "However, within the subsplit, examples are not read in increasing id order (due to shards and interleave).\n",
        "\n",
        "More specifically, `ds.take(x)` and `split='train[:x]'` are **not** equivalent !\n",
        "\n",
        "This can be seen easily in the above interleave example where examples come from different shards."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "7afoTz2XCEFv"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258, 1259]\n",
            "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]\n"
          ]
        }
      ],
      "source": [
        "print_ex_ids(imagenet, split='train', take=25)  # tfds.load(..., split='train').take(25)\n",
        "print_ex_ids(imagenet, split='train[:25]', take=-1)  # tfds.load(..., split='train[:25]')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "B_e-lAnkSvSX"
      },
      "source": [
        "After the 16 (block_length) examples, `.take(25)` switches to the next shard while `train[:25]` continue reading examples in from the first shard."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EZ4RWjOvbLEc"
      },
      "source": [
        "## Recipes"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7Vf0Qg2eVjrH"
      },
      "source": [
        "### Get deterministic file shuffling\n",
        "\n",
        "There are 2 ways to have deterministic shuffling:\n",
        "\n",
        "1. Setting the `shuffle_seed`. Note: This requires changing the seed at each epoch, otherwise shards will be read in the same order between epoch."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Ii0lhSSTYQ9-"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[176411, 176412, 176413, 176414, 176415, 176416, 176417, 176418, 176419, 176420, 176421, 176422, 176423, 176424, 176425, 176426, 710647, 710648, 710649, 710650, 710651, 710652]\n",
            "[176411, 176412, 176413, 176414, 176415, 176416, 176417, 176418, 176419, 176420, 176421, 176422, 176423, 176424, 176425, 176426, 710647, 710648, 710649, 710650, 710651, 710652]\n"
          ]
        }
      ],
      "source": [
        "read_config = tfds.ReadConfig(\n",
        "    shuffle_seed=32,\n",
        ")\n",
        "\n",
        "# Deterministic order, different from the default shuffle_files=False above\n",
        "print_ex_ids(imagenet, split='train', shuffle_files=True, read_config=read_config, take=22)\n",
        "print_ex_ids(imagenet, split='train', shuffle_files=True, read_config=read_config, take=22)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kaMHOCAMVw2A"
      },
      "source": [
        "2. Using `experimental_interleave_sort_fn`: This gives full control over which shards are read and in which order, rather than relying on `ds.shuffle` order."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "WMylp8UmZSSr"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[1279916, 1279917, 1279918, 1279919, 1279920]\n"
          ]
        }
      ],
      "source": [
        "def _reverse_order(file_instructions):\n",
        "  return list(reversed(file_instructions))\n",
        "\n",
        "read_config = tfds.ReadConfig(\n",
        "    experimental_interleave_sort_fn=_reverse_order,\n",
        ")\n",
        "\n",
        "# Last shard (01023-of-01024) is read first\n",
        "print_ex_ids(imagenet, split='train', read_config=read_config, take=5)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HUFRWRa1V28p"
      },
      "source": [
        "### Get deterministic preemptable pipeline\n",
        "\n",
        "This one is more complicated. There is no easy, satisfactory solution.\n",
        "\n",
        "1. Without `ds.shuffle` and with deterministic shuffling, in theory it should be possible to count the examples which have been read and deduce which examples have been read within in each shard (as a function of\n",
        "`cycle_length`, `block_length` and shard order). Then the `skip`, `take` for\n",
        "each shard could be injected through `experimental_interleave_sort_fn`.\n",
        "\n",
        "2. With `ds.shuffle` it's likely impossible without replaying the full training pipeline. It would require saving the\n",
        "`ds.shuffle` buffer state to deduce which examples have been read. Examples\n",
        "could be non-continuous (e.g. `shard5_ex2`, `shard5_ex4` read but not `shard5_ex3`).\n",
        "\n",
        "3.  With `ds.shuffle`, one way would be to save all shards_ids/example_ids read (deduced from `tfds_id`), then deducing the file instructions from that.\n",
        "\n",
        "The simplest case for `1.` is to have `.skip(x).take(y)` match `train[x:x+y]` match. It requires:\n",
        "\n",
        "* Set `cycle_length=1` (so shards are read sequentially)\n",
        "* Set `shuffle_files=False`\n",
        "* Do not use `ds.shuffle`\n",
        "\n",
        "It should only be used on huge dataset where the training is\n",
        "only 1 epoch. Examples would be read in the default shuffle order."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "UP3jmvZPfrGf"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61]\n",
            "[40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61]\n"
          ]
        }
      ],
      "source": [
        "read_config = tfds.ReadConfig(\n",
        "    interleave_cycle_length=1,  # Read shards sequentially\n",
        ")\n",
        "\n",
        "print_ex_ids(imagenet, split='train', read_config=read_config, skip=40, take=22)\n",
        "# If the job get pre-empted, using the subsplit API will skip at most `len(shard0)`\n",
        "print_ex_ids(imagenet, split='train[40:]', read_config=read_config, take=22)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tKw9kG6SaT2E"
      },
      "source": [
        "### Find which shards/examples are read for a given subsplit\n",
        "\n",
        "With the `tfds.core.DatasetInfo`, you have direct access to the read instructions."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "caqarAYkafEo"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[FileInstruction(filename='imagenet2012-train.tfrecord-00450-of-01024', skip=700, take=-1, num_examples=551),\n",
              " FileInstruction(filename='imagenet2012-train.tfrecord-00451-of-01024', skip=0, take=-1, num_examples=1251),\n",
              " FileInstruction(filename='imagenet2012-train.tfrecord-00452-of-01024', skip=0, take=-1, num_examples=1251),\n",
              " FileInstruction(filename='imagenet2012-train.tfrecord-00453-of-01024', skip=0, take=-1, num_examples=1251),\n",
              " FileInstruction(filename='imagenet2012-train.tfrecord-00454-of-01024', skip=0, take=-1, num_examples=1252),\n",
              " FileInstruction(filename='imagenet2012-train.tfrecord-00455-of-01024', skip=0, take=-1, num_examples=1251),\n",
              " FileInstruction(filename='imagenet2012-train.tfrecord-00456-of-01024', skip=0, take=-1, num_examples=1251),\n",
              " FileInstruction(filename='imagenet2012-train.tfrecord-00457-of-01024', skip=0, take=-1, num_examples=1251),\n",
              " FileInstruction(filename='imagenet2012-train.tfrecord-00458-of-01024', skip=0, take=-1, num_examples=1251),\n",
              " FileInstruction(filename='imagenet2012-train.tfrecord-00459-of-01024', skip=0, take=-1, num_examples=1251),\n",
              " FileInstruction(filename='imagenet2012-train.tfrecord-00460-of-01024', skip=0, take=1001, num_examples=1001)]"
            ]
          },
          "execution_count": 48,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "imagenet.info.splits['train[44%:45%]'].file_instructions"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [
        "FN7k9-TsMICZ"
      ],
      "name": "determinism.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
